Textdata Processing with TUSTEP

The "TUebingen System of Text Processing Programs" (TUSTEP) has been developed by the Department of "Literary and Documentary Data-Processing" at the University of Tuebingen Computing Center. The main purpose in developing TUSTEP was to give the user a powerful tool for solving tasks in the scholarly processing of textual data with a minimum of short instructions closely related to the task at hand, instead of depending on less appropriate tools such as programming languages.

The work began in 1966, when we first designed a series of functions and subroutines for character and string handling in FORTRAN (compatible, in their first version, with those developed by the Deutsches Rechenzentrum in Darmstadt) and implemented them on the mainframe of the Computing Center of the University of Tübingen. This made programming easier for projects such as the Metrical Analysis of Latin Hexameter Poetry, the Concordance to the Vulgate, or the edition of and indexes to the works of Heinrich Kaufringer.

Building on the experience gained from those projects, the next step in supporting projects was to stop relying on programming in FORTRAN or other "high level" languages and instead to provide a toolbox of programs, each covering one "basic operation" required for processing textual data. The function of each program is controlled by user-supplied parameters; the programs themselves may be combined in a variety of ways, thereby allowing the user to accomplish tasks of the most diverse kinds. In 1978, these programs were given the name TUSTEP.

We have chosen the term "textdata processing" in order to distinguish between TUSTEP's prime field of application and what is commonly understood by the terms text processing or word processing. Naturally, TUSTEP is also equipped with the functions needed for preparing documents (such as input, editing, formatting and printing of text); these functions are required for documentation and for the preparation of publications in all fields of scholarly work, including both the humanities and the sciences. However, TUSTEP has been developed in particular to serve those academic fields where the texts themselves are the object of scholarly research: philology, literary studies, linguistics, historical sciences and librarianship. These are fields of research where not only are new texts produced and published as the result of scholarly work, but where existing texts (including literary texts and historical sources) are to be preserved for the future in the form of new critical editions, analyzed in terms of language, style and contents, or catalogued in bibliographical form.

The basic operations required by those tasks include: automatic collation of different versions of a text; text correction not only by using an editor, but also in batch mode by means of correction instructions prepared beforehand (by manual transcription, or by program); decomposing texts into elements (e.g. word forms) according to rules provided by the user; building logical entities (e.g. bibliographic records) consisting of more than one line of text; sorting such elements or entities (according to non-Latin alphabetical rules and other sorting criteria as well); preparing indexes by building entries from the sorted elements; processing textual data by selecting records or elements, by replacing strings or text parts, by rearranging, completing, compressing and comparing text parts on the basis of rules and conditions provided by the user, and by retrieving numerical values (including calendar dates) which are already given in the text or which can be derived from it (such as the number of words in a paragraph); transforming textual data from TUSTEP files into file formats used by other systems (e.g. SPSS for statistical analysis).

The tasks which can be accomplished with the help of TUSTEP range from composing a brief seminar paper to preparing extensive bibliographies, lexica, indexes, concordances, dictionaries, critical editions and of course monographs; the final output can be formatted for photocomposition in a quality one is accustomed to in letterpress printing.

In addition to programs for the aforementioned textdata processing operations, TUSTEP features all necessary organizational functions such as file handling and defining new commands, functions which are normally covered by the job control language (JCL) of the respective operating system (OS). Thus, an identical user interface independent of the computer and its OS is provided. This not only saves the user the trouble of having to relearn when he switches to a computer with a different operating system, but also allows him to adopt existing TUSTEP command sequences unchanged.

TUSTEP is constantly being improved and expanded in order to facilitate solutions for new problems in the field of scholarly textdata processing.

Though primarily developed for use at the University of Tuebingen, TUSTEP is also available at a number of other universities. It is available for IBM-compatible PCs, for workstations and for mainframe computers.

The following list contains a selection of TUSTEP programs for the basic operations of textdata processing and of organizational commands. The names in square brackets are the names of the commands.

1. Basic Operations for Textdata Processing in TUSTEP

INPUT

Import of textdata from OCR readers and omnifont readers (such as KDEM, OPTOPUS, OmniPage) and from PC word processing programs

EDITING

Entering, modifying, replacing and searching textdata using the editor at the data display device [EDIT]

Automatic correction of textdata with correction instructions previously defined by the user, e.g. in cases where an interactive correction with the editor is inefficient [CORRECT]
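
The idea of batch correction can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not TUSTEP syntax; the instruction format (line number, wrong string, corrected string) and the function name apply_corrections are assumptions made for this example, not the format CORRECT actually uses.

```python
from typing import List, Tuple

# Hypothetical instruction format: (1-based line number, wrong string, corrected string).
Correction = Tuple[int, str, str]


def apply_corrections(lines: List[str], corrections: List[Correction]) -> List[str]:
    """Return a corrected copy of lines; raise if an instruction does not match."""
    result = list(lines)  # work on a copy so the original text stays untouched
    for line_no, old, new in corrections:
        text = result[line_no - 1]
        if old not in text:
            raise ValueError(f"line {line_no}: {old!r} not found")
        # Replace only the first occurrence so that each instruction fixes one error.
        result[line_no - 1] = text.replace(old, new, 1)
    return result


text = ["Arma virumqne cano,", "Troiae qui primus ab oris"]
fixes = [(1, "virumqne", "virumque")]
corrected = apply_corrections(text, fixes)
```

Keeping the instructions in a separate list means they can be prepared beforehand, checked, and re-applied to a fresh copy of the text, which is the situation in which interactive editing becomes inefficient.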

COMPARING

Comparing different versions of a text; listing and storing differences found [COMPARE]

Listing, in synoptic lines, the basic text and the differences contained in other versions of it [COLLATE]
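
As a rough illustration of what comparing two versions of a text involves, the following Python sketch lists word-level differences using the standard library module difflib. It is not how COMPARE or COLLATE are implemented; the function name collate and the output format are assumptions made for this example.

```python
from difflib import SequenceMatcher


def collate(base: str, witness: str):
    """List word-level differences between a base text and another version."""
    a, b = base.split(), witness.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete" or "insert"
            differences.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return differences


base = "in the beginning was the word"
witness = "in the beginning was a word"
variants = collate(base, witness)
```

Each entry in variants names the kind of difference, the reading of the base text and the reading of the other version, which is the information a synoptic listing would arrange line by line.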

PROCESSING TEXT

Selecting, substituting, rearranging, supplementing, compressing and comparing text parts based on given rules and conditions; performing mathematical calculations using numbers (including calendar-dates) which are either already given in the text or can be derived from it; output in various formats (including those required for subsequent processing outside of TUSTEP) [COPY]

Replacing short forms contained in a text by full text (words, lines, passages) located in a file and identifiable by corresponding short forms [INSERT]
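
The expansion of short forms can be sketched as a simple lookup. The Python example below is only an illustration: the marker syntax "&name;" and the function name expand_short_forms are hypothetical and do not reflect the conventions INSERT actually uses, and the full texts are held in a dictionary rather than in a file.

```python
import re


def expand_short_forms(text: str, full_forms: dict) -> str:
    """Replace every marked short form such as '&llc;' with its full text."""

    def lookup(match):
        key = match.group(1)
        # Leave unknown short forms untouched so they can be spotted later.
        return full_forms.get(key, match.group(0))

    return re.sub(r"&(\w+);", lookup, text)


full_forms = {"llc": "Literary and Linguistic Computing"}
expanded = expand_short_forms("Published in &llc;, see also &xyz;.", full_forms)
```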

Maintaining and updating cross-references [NUMBER]

PREPARING INDEXES

Preparing index entries by decomposing text units into their elements, or by extracting marked text parts; if necessary, supplementing and modifying text parts; defining the sort criteria and the sort alphabet, completing the reference; distinguishing between different types of entries [PINDEX]

PRESORTING

Creating sort units by combining text parts which are logically related; defining the sort criteria (selection and sequence of certain parts of the text units); defining the sort values for any string of characters and defining the sort alphabets, which are needed to determine the sequence of the text units in the subsequent sorting [PRESORT]
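
The separation between preparing sort values and the sorting itself can be shown with a small Python sketch. It is not TUSTEP syntax; the table SORT_VALUES is a hypothetical set of rules chosen for this example (umlauts sorted as two letters, punctuation ignored), standing in for whatever sort alphabet the user defines.

```python
# Hypothetical sort values: each character may be mapped to a replacement string.
SORT_VALUES = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def sort_key(unit: str) -> str:
    """Build the sort key for one text unit according to SORT_VALUES."""
    lowered = unit.lower()
    # Characters that are neither letters nor digits (e.g. hyphens) are ignored.
    return "".join(SORT_VALUES.get(ch, ch) for ch in lowered if ch.isalnum())


units = ["Müller", "Mueller-Stahl", "Mozart", "Maß", "Mast"]

# Presorting step: attach a sort key to every unit.
sort_units = [(sort_key(unit), unit) for unit in units]

# Sorting step: order the units by their prepared keys only.
presorted = [unit for _, unit in sorted(sort_units, key=lambda pair: pair[0])]
```

Because the keys are computed once and stored with the units, the sorting step itself needs no knowledge of the alphabet rules.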

SORTING

Rearranging the sort units prepared by programs such as PINDEX and PRESORT [SORT]

GENERATING INDEXES AND CONCORDANCES

After SORT has been run, reducing multiple and perhaps hierarchically structured index entries or text units; supplementing and substituting text parts and references; distinguishing between different types of entries; calculating absolute and relative frequencies [GINDEX]
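
The sequence of preparing entries, sorting them and reducing them to an index can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the behaviour of PINDEX, SORT or GINDEX: word forms are taken to be runs of letters and digits, references are line numbers, and the function names index_entries and generate_index are invented for this example.

```python
import re
from itertools import groupby


def index_entries(lines):
    """Decompose lines into (word form, line number) pairs."""
    return [
        (word.lower(), number)
        for number, line in enumerate(lines, start=1)
        for word in re.findall(r"\w+", line)
    ]


def generate_index(entries):
    """Sort the entries, merge identical word forms and count frequencies."""
    ordered = sorted(entries)
    total = len(ordered)
    index = []
    for word, group in groupby(ordered, key=lambda entry: entry[0]):
        refs = [number for _, number in group]
        # word, distinct references, absolute frequency, relative frequency
        index.append((word, sorted(set(refs)), len(refs), len(refs) / total))
    return index


lines = ["the cat saw the dog", "the dog ran"]
index = generate_index(index_entries(lines))
```

Here the entry for "the" is ("the", [1, 2], 3, 0.375): it occurs three times among eight word forms, on lines 1 and 2.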

GENERATING LISTINGS

Preparing output for line printers, daisy wheel printers, dot matrix printers, laser printers and microfilm recording devices

- in the form in which the data are recorded in the file; control characters are not interpreted but printed [GLISTING]

- in a format and an arrangement which can be defined by control characters contained in the text, using the entire inventory of typefaces and characters available for the selected printer; with automatic hyphenation and line division as well as other page layout features, including line justification and footnote placement [FORMAT]

- for forms (e.g. address stickers, catalogue cards, standardized letters, office forms) [GFORMS]

TYPESETTING

Converting textdata into a form required for typographic output on PostScript printers or (for professional composition) on a composition device (presently Hell DIGISET, Monotype LASERCOMP, PostScript imagesetters); including automatic line division (with or without line justification and tabular settings) and automatic page makeup with running titles, headings, text, insertions in smaller type, marginal notes, footnotes and up to nine critical apparatuses at page bottom; variety of typefaces and special characters available [COMPOSITION]

The program COMPOSITION is not available for MS-DOS machines.

2. File Handling and Job Control in TUSTEP

DATA TRANSFER

Transfer of textdata from host files (e.g. ASCII files) and conversion to TUSTEP format and vice versa [CONVERT]

FILE MANAGEMENT

Creating, cataloging, opening, closing, renaming, deleting files [CREATE, OPEN, CLOSE, RENAME, ERASE]

USING MAGNETIC TAPE

Reading from and writing to magnetic tape (also used for compatible data transfer between computers having different operating systems); listing the contents of magnetic tapes [MTREAD, MTWRITE, MTCOPY, MTINFORM]

JOB CONTROL

Executing and controlling sequences of commands and programs; defining and executing user-defined commands ("macros") [EXECUTE, MACRO]

3. Learning TUSTEP

TUSTEP is bilingual: it accepts commands in German and in English and responds in the language used for the latest command.

There is also a (preliminary) English translation of the (German) TUSTEP user's manual. This manual is not meant to be a teach-yourself text; it is a reference guide for those acquainted with the basic TUSTEP functions.

A beginner's manual (Lernbuch TUSTEP, Tübingen: Niemeyer 1995, XII+384 pages, 36.80 DM, ISBN 3-484-73019-6) is presently available in German.

We advise beginners to take one of our courses (in German) offered during the semester breaks. There are two such courses. The first is a 1-week introductory course (5 hours a day, plus exercises) given in March and September. This course deals with file handling and other control commands, plus the use of the TUSTEP editor and the other TUSTEP programs required for entering, correcting, searching, formatting and printing texts. The second course, lasting 2 weeks (5 hours a day, plus exercises), is given in the second half of September. It covers the remaining TUSTEP commands and teaches the user how to solve complex problems with the help of TUSTEP.

The International TUSTEP User Group (ITUG) was founded in October 1993 in Würzburg, Germany, as a forum of information and communication for TUSTEP users. An electronic information service (WWW, FTP, gopher; the respective addresses are http://www.germanistik.uni-wuerzburg.de/itug.html, ftp.germanistik.uni-wuerzburg.de and gopher.germanistik.uni-wuerzburg.de) offers information on new features contained in TUSTEP and on courses and other meetings, and gives access to sample solutions and useful procedures. Surface mail address: ITUG, c/o Universität Würzburg, Deutsches Seminar, Am Hubland, D-97074 Würzburg, Fax +49-931-888-4616.

To facilitate the exchange of information between scholars who use (or plan to use) computers in the humanities and the staff of the Department for Literary and Documentary Data Processing, the Colloquium on the Use of Electronic Data Processing in the Humanities at the University of Tübingen was established in 1973 and is held three times a year at the University of Tübingen Computing Center. The reports of these Colloquia are published in the journal Literary and Linguistic Computing (prior to 1985: ALLC-Bulletin).

Revision: January 1996

TUSTEP has been developed at the Computing Centre of the University of Tübingen Abt. Literarische und Dokumentarische Datenverarbeitung, Brunnenstrasse 27, D-72074 Tübingen
E-mail: tustep@zdv.uni-tuebingen.de, Tel. +49-7071-2972933, 2972901, Fax +49-7071-295912

tustep@zdv.uni-tuebingen.de - Last updated: 13 January 1997
